NAGPUR UNIVERSITY


DR. M. A. CHANSARKAR
VICE CHANCELLOR


FOREWORD 


My brother Dr. B.A. Chansarkar of Middlesex Business School,
U.K., who is an eminent scholar, was a Visiting Professor in the
Post-Graduate Teaching Department of Statistics at Nagpur
University during 1980-81. During this period, at the request of
the University, he conducted five enlightening staff seminars on
"Applications of Multivariate Techniques in Various Situations",
which were highly appreciated by one and all. This book is an
enlarged and improved version of the same.


Indian statisticians have played a dominant role in the
development of statistical theory, especially in the field of
Multivariate Analysis. Over the last thirty years, Multivariate
Analysis has played an important role in all aspects of
statistical interpretation. This has now been further
strengthened by the development and use of computers as a part
of educational technology. With developing computer technology,
extensive use is now made of techniques that were hitherto
difficult to apply because of time-consuming manual
calculations. With easy access to advanced computer packages,
multivariate techniques are commonly used in all disciplines.


Multivariate analysis has become an essential research tool for
tackling problems of a complicated nature, both scientific and
social. Modern facilities for extensive data collection and
storage are easily available, and this has further encouraged
the use of multivariate techniques. The application of
multivariate techniques has now become a common feature of
national planning, business modelling, psychometric analysis and
all aspects of research, both in educational institutions and in
industry.


The present book by Dr. B.A. Chansarkar is one of the first
attempts of its kind in this highly specialised field. I am sure
this publication will go a long way in helping students,
teachers, businessmen and industrialists to develop an in-depth
understanding of the use of multivariate techniques, as they
have immensely helped the Faculty Members of Nagpur University.


Nagpur M.A. CHANSARKAR 
September 1987 


PREFACE 


This book on Applied Multivariate Analysis is based on a series
of staff seminars conducted by me at the Post-Graduate
Department of Statistics at Nagpur University, Nagpur (India),
while I was there as a Visiting Professor. During this period,
having experienced the method of teaching and realising how
theoretical the approach to the subject is in India, I felt the
need for a book of this kind, elucidating the applications of
these techniques.


This objective is achieved through this book to some extent by
concentrating on only those techniques which are commonly used.
I hope the book will be of immense value to students of
Statistics and Economics and to all those engaged in research,
either for a higher degree or in industry.


My sincere thanks are due to Nagpur University and its Faculty
members, whose generous help and assistance made the Visiting
Professorship and seminars possible.


Ordinarily, my brother, Dr. M.A. Chansarkar, Vice-Chancellor,
Nagpur University, would neither expect nor like to be thanked
for writing the Foreword to this publication. However, I would
be failing in my duty if I did not express my gratitude for the
same.


Enfield, UK                                   B.A. CHANSARKAR
August 1987


1. Introduction


1.1 Multivariate statistical methods have been widely used over
the last decade or two because they make it possible to
encompass all the data from an investigation in one analysis.
This approach results in a clearer, better organised account of
the investigation than the piecemeal analysis of parts of the
data which is often observed in behavioural studies. It also
permits more realistic probabilistic statements in hypothesis
testing and interval estimation than do separate analyses. Such
a comprehensive approach demands foresight and organising
ability on the part of the researcher, and a successful
investigation requires the proper use of statistical models to
focus thinking and express hypotheses succinctly. Such models,
in behavioural studies, tend to be multivariate since many
response variables are observed simultaneously.

The main contributors to the development of multivariate methods
have been H. Hotelling, R.A. Fisher, S.S. Wilks, S.N. Roy, M.S.
Bartlett, C.R. Rao and T.W. Anderson. The availability of
computers to perform the laborious computations required in
multivariate analysis when the number of variables exceeds 2 or
3 has helped the development of statistical theory and its wide
application in various fields, such as marketing, industrial and
business analysis, attitude measurement, agriculture and the
behavioural sciences.

The type of analysis (see Appendix A) one can perform depends
heavily on the available or collected information. It is in this
context that the type of measurement and its reliability are
quite important. There are basically two main types of
measurement scales: (a) metric and (b) non-metric. These types
are determined both by the empirical operations involved in the
process of measuring and by the mathematical properties of the
scale.
1.2 The four classes of measurement(1) are nominal, ordinal,
interval and ratio. The first two are non-metric and the other
two are metric.

Nominal (Classificatory):
    Crudest form of measurement.
    Measurement at weakest level.
    Statistic relevant:  Mode

Ordinal (Ranking):
    Rank order.
    Statistics relevant: Median
                         Percentile
                         Spearman's r

Interval (Cardinal):
    Distances between any two numbers on the scale are of
    known size.
    Unit of measurement arbitrary.
    Quantitative scale.
    Statistics relevant: Mean
                         Standard deviation
                         Pearson's r
                         Multiple R

Ratio:
    Zero as origin.
    Rarely used in marketing conditions.
    Statistics relevant: Geometric mean
                         Coefficient of variation

1.3 Sample selection, and the material (universe) from which the
sample is drawn, are quite important. The sample should be drawn
randomly and independently from a large population. The
reliability of a sample estimate is based on the standard error
of the corresponding estimate, e.g., the standard error of the
sample mean,

\[ S_{\bar{x}} = \frac{s_x}{\sqrt{n}} . \]

The most commonly used measure of location is the arithmetic
mean $\bar{x} = \frac{1}{n}\sum x$, though certain situations
may demand the use of other measures (median, mode). The square
root of the variance (s.d.)

\[ s_x = \sqrt{\frac{1}{n}\sum (x-\bar{x})^2} \]

and the covariance

\[ s_{xy} = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}) \]

are the most commonly used measures of dispersion, though the
mean deviation or absolute mean deviation may be suitable in
some situations. With several measurements being recorded on an
individual, different scales (units) are used, and the
variance-covariance matrix is dependent on the units of
measurement. Therefore, in many situations Pearson's product
moment correlation coefficient

\[ r_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}
                 {\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}} \]

is used. r can vary only between -1 and +1. When r = ±1, all
points lie exactly on a straight line and there is a perfect
relationship, and when r = 0 there is no linear relationship.
This property is very useful in interpreting r. Further, r², the
coefficient of determination, is useful for interpretation
purposes as it tells us the percentage of the variation in y
that is explained by the variation in x (and vice versa).
External validation is necessary to know whether the
relationship is a causal one or not.
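The correlation coefficient and the coefficient of determination described above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the small data set are assumptions made for the example, not part of the original text.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's product moment correlation between two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # deviations from the mean of x
    dy = y - y.mean()          # deviations from the mean of y
    return float((dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy)))

# Hypothetical illustrative data (not from the book).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

r_demo = pearson_r(x_demo, y_demo)
r_squared_demo = r_demo ** 2   # share of variation in y explained by x
print(round(r_demo, 4), round(r_squared_demo, 4))
```

Because r is built from deviations scaled by their own spread, rescaling x or y (changing the units) leaves it unchanged, which is the reason given above for preferring it to the covariance.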


2. Regression Analysis


2.1 Dependence analysis with one variable being depen- 
dent and the rest independent. 

Assumption (1): One of the standard assumptions of regression
analysis is that the model which describes the data is linear.
The relationship (as seen from the scattergram) may be
non-linear. There are several non-linear models which by
appropriate transformations(1) can be made linear (see Appendix
B).

Assumption (2): The other standard assumption is con- 
stancy of error variance. When the error variance is not cons- 
tant over all the observations, the error is said to be 
heteroscedastic. There are transformations(2) which stabilise 
the variance and also have the effect of making the distribution 
of the transformed variable closer to the normal distribution 
(see Appendix C). 


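The data consist of n observations on a dependent variable y and
p independent (explanatory) variables $x_1, x_2, \ldots, x_p$.

Let

\[ X = \begin{pmatrix}
x_{01} & x_{11} & \cdots & x_{p1} \\
x_{02} & x_{12} & \cdots & x_{p2} \\
\vdots & \vdots &        & \vdots \\
x_{0n} & x_{1n} & \cdots & x_{pn}
\end{pmatrix}, \quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
\quad \text{and} \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}. \]

The x's are non-stochastic, i.e., the values of the x's are
fixed and the x's are measured without error.

The linear model representing the data is

\[ y = X\beta + u, \quad \text{where } x_{0i} = 1 \text{ for all } i. \]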
The assumptions made about u for the least squares estimator are

\[ E(u) = 0, \qquad \mathrm{Var}(u) = E(uu') = \sigma^2 I_n , \]

i.e., the u's are independent, have zero mean and constant
variance.

This implies $E(y) = X\beta$.

The least squares estimate of $\beta$ is obtained by minimising
the sum of squares of deviations of the observations from their
expected values,

\[ S(\beta) = u'u = (y - X\beta)'(y - X\beta). \]

The minimum of $S(\beta)$ leads to the system of equations

\[ (X'X)\,b = X'y \quad \text{(called the Normal equations)} \]

\[ \therefore \; b = (X'X)^{-1}X'y \quad \text{iff } |X'X| \neq 0. \]

The vector of predicted values corresponding to y is
$\hat{y} = Xb$.

The vector of residuals is $e = y - \hat{y} = y - Xb$.
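As a minimal sketch of the normal equations above, the following Python code builds X with a leading column of ones, solves $(X'X)b = X'y$, and returns the fitted values and residuals. The function name `ols_fit` and the small noise-free data set are assumptions made purely for illustration.

```python
import numpy as np

def ols_fit(x_vars, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    x_vars = np.asarray(x_vars, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x_vars.shape[0]
    X = np.column_stack([np.ones(n), x_vars])   # x_0i = 1 for all i
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) < XtX.shape[0]:
        raise ValueError("X'X is singular; b is not uniquely defined")
    b = np.linalg.solve(XtX, X.T @ y)           # b = (X'X)^{-1} X'y
    y_hat = X @ b                               # predicted values
    e = y - y_hat                               # residuals
    return X, b, y_hat, e

# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2.
x_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                   [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y_demo = 1.0 + 2.0 * x_demo[:, 0] - 0.5 * x_demo[:, 1]

X_demo, b_demo, y_hat_demo, e_demo = ols_fit(x_demo, y_demo)
print(np.round(b_demo, 6))
```

A useful check in practice is that the residuals are orthogonal to every column of X, i.e. $X'e = 0$, which is just the normal equations rewritten.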


2.2 The properties of the least squares estimators are

(1) b is an unbiased estimator of $\beta$, i.e. $E(b) = \beta$,
with variance-covariance matrix

\[ V(b) = E\,(b-\beta)(b-\beta)' = \sigma^2 (X'X)^{-1} = \sigma^2 C,
   \quad \text{where } C = (X'X)^{-1}. \]

(2) The unbiased estimator of $\sigma^2$ is $s^2$, where

\[ s^2 = \frac{e'e}{n-p-1} = \frac{(y-Xb)'(y-Xb)}{n-p-1}
       = \frac{y'y - b'X'y}{n-p-1}. \]

With the added assumption that the $u_i$'s are normally
distributed, we have

(3) The vector b has a (p + 1)-variate normal distribution with
mean $\beta$ and variance $\sigma^2 C$. The marginal
distribution of $b_i$ is normal with mean $\beta_i$ and variance
$\sigma^2 c_{ii}$, where $c_{ii}$ is the ith diagonal element of
C.

(4) The quantity $W = e'e / \sigma^2$ is distributed as
$\chi^2$ with n - p - 1 degrees of freedom.

(5) b and $s^2$ are distributed independently of each other.
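Properties (1) and (2) translate directly into the standard errors reported alongside regression coefficients. The sketch below assumes X already contains the column of ones; the function name `ols_inference` and the four-point data set are illustrative assumptions.

```python
import numpy as np

def ols_inference(X, y):
    """Return b, s^2 and the standard errors sqrt(s^2 * c_ii)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                      # k = p + 1 (includes intercept)
    C = np.linalg.inv(X.T @ X)          # C = (X'X)^{-1}
    b = C @ X.T @ y
    e = y - X @ b
    s2 = float(e @ e) / (n - k)         # e'e / (n - p - 1)
    se = np.sqrt(s2 * np.diag(C))       # estimated s.d. of each b_i
    return b, s2, se

# Hypothetical data: one explanatory variable plus intercept.
X_demo = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y_demo = np.array([1.0, 3.0, 2.0, 4.0])

b_demo, s2_demo, se_demo = ols_inference(X_demo, y_demo)
print(np.round(b_demo, 4), round(s2_demo, 4), np.round(se_demo, 4))
```

For these four points the slope is 0.8 and the intercept 1.3, with e'e = 1.8 and hence s² = 1.8 / 2 = 0.9; the same value is obtained from the alternative form y'y - b'X'y = 30 - 28.2.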


2.3 The adequacy of the linear model is judged by the multiple
correlation coefficient R, where

\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
       = 1 - \frac{\sum (y_i - \hat{y}_i)^2 / n}{\sum (y_i - \bar{y})^2 / n}
       = 1 - \frac{S_e^2}{S_y^2}. \]

The estimates will be unbiased estimators if corrected for the
degrees of freedom. So

\[ \bar{R}^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2 / (n-p-1)}
                        {\sum (y_i - \bar{y})^2 / (n-1)}. \]

$\bar{R}^2$ enables one to compare separate regression equations
with different numbers of pre-determined variables X in the
equations. It is this aspect, the total variation explained,
that enables one to choose the variables to be included in the
model.
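A short sketch of the two measures, assuming the usual degrees-of-freedom correction (n - p - 1 for the residual sum of squares and n - 1 for the total sum of squares). The function name `r_squared` and the demonstration values are illustrative assumptions.

```python
import numpy as np

def r_squared(y, y_hat, p):
    """Return (R^2, adjusted R^2) for a model with p explanatory variables."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    ss_res = float(np.sum((y - y_hat) ** 2))      # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    return r2, r2_adj

# Hypothetical observations and fitted values from a one-variable model.
y_demo = [1.0, 3.0, 2.0, 4.0]
y_hat_demo = [1.3, 2.1, 2.9, 3.7]

r2_demo, r2_adj_demo = r_squared(y_demo, y_hat_demo, p=1)
print(round(r2_demo, 4), round(r2_adj_demo, 4))
```

Unlike R², the adjusted value can fall when a variable with little explanatory power is added, which is what makes it usable for comparing equations of different size.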

2.4 Forward stepwise regression analysis and the backward
elimination procedure:

In forward stepwise regression analysis, the first variable
included in the equation is the one with the highest correlation
coefficient with y. If its coefficient b is significantly
different from zero, the first variable is retained and a search
is made for the second variable. The variable with the highest
simple correlation coefficient with the residuals from step one
is then included. If b for the second variable is significantly
different from zero, it is retained and a search is made for the
third one in the same manner. The cut-off comes when the last
variable entering has a non-significant regression coefficient
or all the variables are included in the equation. The backward
elimination procedure is exactly the reverse. One starts with
the full equation with all the variables included and
successively drops one variable at a time. The variables are
eliminated on the basis of their contribution to the reduction
of the error sum of squares.
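The forward procedure described above can be sketched as follows. This is a simplified illustration, not a full stepwise routine: significance is judged by comparing |b / s.e.(b)| with a fixed threshold `t_crit`, and the function name, the threshold and the simulated data are all assumptions made for the example.

```python
import numpy as np

def forward_stepwise(X, y, t_crit=2.0):
    """Add variables one at a time, as described in the text.

    At each step the candidate with the highest absolute correlation
    with the current residuals is tried; it is kept only if its
    coefficient is significant (|t| >= t_crit).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    residuals = y - y.mean()            # before any variable is entered
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        corrs = [abs(np.corrcoef(X[:, j], residuals)[0, 1]) for j in remaining]
        candidate = remaining[int(np.argmax(corrs))]
        trial = selected + [candidate]
        A = np.column_stack([np.ones(n), X[:, trial]])
        b = np.linalg.solve(A.T @ A, A.T @ y)
        e = y - A @ b
        s2 = (e @ e) / (n - len(trial) - 1)
        se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
        if abs(b[-1] / se[-1]) < t_crit:   # last entrant not significant
            break
        selected, residuals = trial, e
    return selected

# Simulated data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 2] + rng.normal(size=200)

chosen = forward_stepwise(X_demo, y_demo, t_crit=4.0)
print(chosen)
```

A deliberately strict threshold is used in the demonstration so that the irrelevant column is very unlikely to enter by chance; in applied work the cut-off would come from the t distribution at the chosen significance level.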

Dummy variables are used in a variety of ways and may be
considered whenever there are qualitative factors affecting a
relationship, e.g., sex, marital status, political affiliation
etc. One assigns the value zero or one to the variable according
to whether the respondent lacks or possesses the characteristic
under investigation.


2.5 Application of Regression Analysis.

In order to assess the contribution of different factors
affecting foodgrains production in India(3), linear regression
analysis was performed over a period of 26 years, 1951-1977,
using the following set of variables:

y  = Total production of foodgrains (Million Tonnes)
x1 = Public sector outlay at constant 1961-62 prices on
     Agriculture and related sectors (Rs. Crores)
x2 = Weather conditions: 0 when normal,
                         1 otherwise (droughts and/or floods)
x3 = Availability of fertilisers ('000 Tonnes)
x4 = Gross area irrigated under foodgrains (M. Hectares)
x5 = Gross area unirrigated under foodgrains (M. Hectares)
x6 = Net imports of foodgrains (M. Tonnes),
     a proxy for 'gap between requirement and availability'
x7 = Wholesale price index of foodgrains (1961-62 = 100),
     derived series.
The equation with all variables was

\[ y = -105.28 + \underset{(0.01)}{0.005}\,x_1
       - \underset{(1.53)}{2.77}\,x_2
       + \underset{(0.01)}{0.004}\,x_3
       + \underset{(0.95)}{2.23}\,x_4
       + \underset{(0.23)}{1.45}\,x_5
       - \underset{(0.47)}{0.85}\,x_6
       + \underset{(0.06)}{0.01}\,x_7 \]

Figures in brackets are standard errors of coefficients.
$R^2 = 0.97$, $\bar{R}^2 = 0.96$.

Forward stepwise regression analysis was performed to judge the
importance of the variables to include in the model. The final
equation was

\[ y = -115.19 - \underset{(1.51)}{3.03}\,x_2
       + \underset{(0.20)}{3.33}\,x_4
       + \underset{(0.21)}{1.29}\,x_5
       - \underset{(0.29)}{0.83}\,x_6 \]

where x2 is weather, x4 area irrigated, x5 area unirrigated and
x6 imports. Figures in brackets are standard errors of
coefficients.
$R^2 = 0.96$, $\bar{R}^2 = 0.96$.


It is interesting to note that the two area variables, x4 and
x5, contribute 90% of the total variation.


2.6 The Problem of correlated errors (Autocorrelation) 


When the observations have a natural sequential order, the
correlation between errors is referred to as autocorrelation.
The presence of autocorrelation has the following effects:

(1) Least squares estimates are unbiased but not efficient, in
    the sense that they no longer have minimum variance.

(2) The estimates of $\sigma^2$ and the standard errors of the
    regression coefficients may be seriously understated,
    giving a spurious impression of accuracy.

(3) The confidence intervals and the various tests of
    significance commonly used are no longer valid.

Two types of autocorrelation can occur in practice. Firstly,
there is apparent autocorrelation due to the omission of a
variable that should be in the model. Once that variable is
uncovered, the problem is resolved. Secondly, there can be pure
autocorrelation. Correction involves transformation of the
data(4).

Detection of autocorrelation by the Durbin-Watson statistic.
The amount of autocorrelation that exists in the residuals is
measured by the Durbin-Watson statistic.

The errors are assumed to constitute a first order
autoregressive series,

\[ u_t = \rho\,u_{t-1} + \varepsilon_t, \qquad |\rho| < 1, \]

where the $\varepsilon_t$ are independent and normally
distributed with zero mean and constant variance.

Let

\[ d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \]

be the test statistic for $H_0: \rho = 0$ against
$H_1: \rho > 0$.

We estimate the parameter $\rho$ by r, where

\[ r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2},
   \qquad d \approx 2(1 - r) \text{ approximately.} \]

d ranges between 0 and 4.
d = 0 if r = 1
d = 2 if r = 0
d = 4 if r = -1
Deviation of d from the numerical value 2 indicates
autocorrelation.
Rule: if d < d_L, reject H0;
      if d > d_U, do not reject H0;
      if d_L < d < d_U, the test is inconclusive.

[Figure: regions of the d scale. 0 to d_L: positive
autocorrelation; d_L to d_U: inconclusive; d_U to 4 - d_U: no
autocorrelation; 4 - d_U to 4 - d_L: inconclusive; 4 - d_L to 4:
negative autocorrelation.]
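The statistic and the decision rule above can be sketched in Python as follows. The function names and the toy residual series are assumptions for illustration; the bounds d_L and d_U must still be read from published Durbin-Watson tables.

```python
import numpy as np

def durbin_watson(e):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def lag1_autocorrelation(e):
    """r = sum e_t e_{t-1} / sum e_t^2, the estimate of rho."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

def dw_decision(d, d_lower, d_upper):
    """One-sided rule for H0: rho = 0 against H1: rho > 0."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# Hypothetical residuals that alternate in sign (negative autocorrelation).
e_demo = np.array([1.0, -1.0, 1.0, -1.0])
d_demo = durbin_watson(e_demo)
r_demo = lag1_autocorrelation(e_demo)
print(d_demo, r_demo, 2 * (1 - r_demo))
```

The relation d ≈ 2(1 - r) is approximate because the two end terms differ: exactly, d = 2(1 - r) - (e_1² + e_n²) / Σe_t², and the correction becomes negligible as n grows.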
Example
Consumer expenditure ($y_t$) and money stock ($x_t$) in U.S.A.,
1952 to 1956, quarterly data. Units of measurement: billion
current dollars.

\[ \hat{y}_t = \underset{(19.85)}{-154.700} + \underset{(0.115)}{2.300}\,x_t \]

(Figures in brackets are standard errors of the estimates)
$R^2 = 0.955$
d = 0.328, $d_L$ = 1.28 for n = 20

d is significant at the 1% level.
Cochrane and Orcutt(5) have suggested the following method,
using a transformation to remove autocorrelation.
The procedure is: use $y_t - \rho y_{t-1}$ and
$x_t - \rho x_{t-1}$ instead of $y_t$ and $x_t$.

First fit the equation by ordinary least squares and use the
residuals to obtain the estimate $\hat{\rho}$.
For the USA data above, $\hat{\rho} = 0.874$, and using this

\[ \hat{y}_t = -324.44 + \underset{(0.44)}{2.758}\,x_t \]

(The figure in brackets is the standard error of the estimate)
d = 1.607; H0 is accepted at the 5% level of significance since
$d_U$ = 1.49.

2.7 Multicollinearity: The phenomenon of mutual linear
dependence between the explanatory variables $x_i$ is called
multicollinearity(6). When there is a complete absence of linear
relationship among the explanatory variables, they are said to
be orthogonal. Multicollinearity affects statistical
inference(7) and forecasting(8).

Indications of multicollinearity that appear as instability in
the estimated coefficients are as follows:
(1) Large changes in the estimated coefficients when a
    variable is added or deleted.
(2) Large changes in the estimated coefficients when a
    data point is altered or dropped.
Once the residual plots indicate that the model has been
satisfactorily specified, multicollinearity may be present if
(3) The algebraic signs of the estimated coefficients do not
    conform to prior expectations, or
(4) Coefficients of variables that are expected to be
    important have large standard errors.
To overcome multicollinearity, use Principal Components
(see Chapter 4 for further details).
3. Discriminant Analysis


3.1 Dependence analysis but more than one dependent.


The problem of classification arises when an investigator makes
a number of measurements on an individual and wishes to classify
the individual into one or the other of the groups. Consider the
situation where you are interested in attempting to discriminate
between new Ford and Chevrolet buyers(1). The characteristics
used are personality needs, socio-economic variables and a
combination of both. Similarly, as a Bank Manager you may want
to decide whether or not to advance a loan. Discriminant
analysis is useful in situations where a total sample is divided
into known groups based on some classificatory variable and the
researcher is interested in understanding the group differences,
or in correctly predicting the group to which a new sample
member belongs, based on the information on a set of predictor
variables. Other examples of discriminant analysis are
distinguishing listeners who sent for a programme guide from
those who did not(2), effective new product decisions for
supermarkets(3) and types of holders of savings accounts(4).

3.2 In constructing the procedure of classification, it is
desired to minimise the probability of misclassification. The
space is to be divided into regions R1 and R2 such that the
expected loss is as small as possible. The Bayes procedure is
usually used; it is a minimax procedure if the maximum expected
loss is a minimum(5).

A discriminant function is a linear function of the set of
observations weighted by the inverse of the variance-covariance
matrix. The linear function has the greatest variance between
samples relative to the variance within samples. It is thus
analogous to the one-way classification of analysis of
variance(6).
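[Figure: scatter diagram with axes labelled Age and Sex; o = Good
Risk, x = Bad Risk.]

Consider two multivariate normal populations with equal
covariance matrices, namely $N(\mu^{(1)}, \Sigma)$ and
$N(\mu^{(2)}, \Sigma)$, where
$\mu^{(i)} = (\mu_1^{(i)}, \ldots, \mu_p^{(i)})'$ is the vector
of means of the ith population, i = 1, 2, and $\Sigma$ is the
variance-covariance matrix of each population. Further, let x be
an observation. We wish to classify x into one of the two
populations.

The discriminant function is

\[ x'\,\Sigma^{-1}\,(\mu^{(1)} - \mu^{(2)}) \]

and if the two populations are equally likely and the costs of
misclassification are equal, then if the discriminant function
has a value greater than the scalar

\[ \tfrac{1}{2}\,(\mu^{(1)} + \mu^{(2)})'\,\Sigma^{-1}\,(\mu^{(1)} - \mu^{(2)}) \]

the observation x belongs to the first population, otherwise to
the second population.

With sample estimates, x is assigned to the first population if

\[ x' S^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)})
   - \tfrac{1}{2}\,(\bar{x}^{(1)} + \bar{x}^{(2)})' S^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)})
   \;\geq\; \log K \]

where

\[ (n_1 + n_2 - 2)\,S
   = \sum_{\alpha=1}^{n_1} (x_\alpha^{(1)} - \bar{x}^{(1)})(x_\alpha^{(1)} - \bar{x}^{(1)})'
   + \sum_{\alpha=1}^{n_2} (x_\alpha^{(2)} - \bar{x}^{(2)})(x_\alpha^{(2)} - \bar{x}^{(2)})' . \]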
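The sample version of the rule can be sketched in Python as follows. The function names and the two tiny samples are assumptions chosen so that the arithmetic can be followed by hand; `log_k` defaults to zero, which corresponds to equal prior probabilities and equal costs of misclassification.

```python
import numpy as np

def fit_discriminant(sample1, sample2):
    """Return the weights S^{-1}(xbar1 - xbar2) and the cut-off scalar."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    d1, d2 = x1 - m1, x2 - m2
    # Pooled within-group covariance: (n1 + n2 - 2) S = sums of products.
    S = (d1.T @ d1 + d2.T @ d2) / (n1 + n2 - 2)
    w = np.linalg.solve(S, m1 - m2)
    cutoff = 0.5 * float((m1 + m2) @ w)
    return w, cutoff

def classify(x, w, cutoff, log_k=0.0):
    """Assign x to population 1 if x'w - cutoff >= log K, else to 2."""
    return 1 if float(np.asarray(x, dtype=float) @ w) - cutoff >= log_k else 2

# Hypothetical samples with two measurements per individual.
group1 = [[4, 2], [5, 3], [6, 2], [5, 1]]
group2 = [[1, 1], [2, 2], [3, 1], [2, 0]]

w_demo, cutoff_demo = fit_discriminant(group1, group2)
print(w_demo, cutoff_demo, classify([5, 2], w_demo, cutoff_demo))
```

Here the group means are (5, 2) and (2, 1) and S = (2/3) I, so the weights are (4.5, 1.5) and the cut-off is 18; an individual scoring above 18 on 4.5 x1 + 1.5 x2 is assigned to the first group.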
3.3 Once the discriminant function is obtained, the function can
be tested for significance. Then, the classification of all the
individuals in the sample is worked out. The proportion of
correct classifications is then compared against what could have
been predicted by chance without any knowledge of the scores on
the predictor variables(7). However, to avoid bias, it is more
appropriate to validate the analysis by using the discriminant
weights on another sample of individuals, though sometimes it is
preferable to split the sample into halves, using one half for
analysis and the other half for validation.

In a certain bank, in order to advance loans to prospective
applicants, the characteristics of 52 previous clients, chosen
randomly, have been studied over a period of two years and in
the given localities. It is known that of these 25 were good
risks (returned the loan as per the terms) and 27 were bad
risks. Information on the following characteristics was studied:

x1 = Sex: 0 = Male, 1 = Female
x2 = Number of years of service
x3 = Number of children
x4 = Net weekly income adjusted for taxes
x5 = Average weekly rent/mortgage.

Preliminary analysis indicates that variables x1 and x3 are not
significantly different between the groups and that the overall
discrimination is not satisfactory. A further run with only the
3 remaining variables x2, x4 and x5 gave the following results:

            Means                Matrix of corrected sums of
                                 squares and products
      Good Risk   Bad Risk         x2         x4         x5
x2       7.40       5.95         20.96     421.92     148.24
x4      87.84      35.67                 88897.36    9135.09
x5      36.44      21.93                             6090.12


The discriminant rule is: if

\[ 3.032\,x_2 + 0.012\,x_4 + 0.032\,x_5 \geq 21.90 \]

then the individual is classified as a good risk.

An individual with 6 years of service, a weekly net income of
£31.0 and a mortgage of £23 a week is a bad risk (as the
discriminant function has the value of 19.91) and will not be
advanced the loan.
The classification of all the individuals gave the following
result:

                                  Observed
                               Good    Bad    Total
Expected             Good       23       6      29
(by discriminant     Bad         2      21      23
analysis)            Total      25      27      52

Fisher's exact test(6) has probability less than 0.05 and hence
the null hypothesis of independence is rejected.
4. Principal Component Analysis


4.1 Interdependence analysis with metric inputs. 

In the analysis of interdependence, there is no one variable 
or variable subset that is the focus of study that differs in im- 
portance from the others. A variable or set of variables is not to 
be predicted from the others or explained by them. The goal, 
rather, is to give meaning to a set of variables or objects. 


When variables are related, they can be made orthogonal (and
therefore completely independent) by the method of principal
component analysis.


Every linear regression model can be restated in terms of a 
set of orthogonal explanatory variables. They are referred to as 
the principal components of the explanatory set of variables. 


Principal components are linear combinations of variables which
have special properties in terms of variances, e.g., the first
principal component is the normalised linear combination (i.e.,
the sum of squares of the coefficients being one) with maximum
variance. The principal components turn out to be the
characteristic vectors of the covariance matrix(1).


Suppose X has p variables (measurements) with covariance matrix
$\Sigma$. The actual distribution of X is irrelevant except for
the covariance matrix; however, if X is normally distributed,
more meaning can be given to the principal components.


Let C be a p-component column vector such that $C'C = 1$.
The variance of $C'X$ is

\[ E(C'X)^2 = E(C'XX'C) = C'\Sigma C \qquad (1) \]

To determine the normalised linear combination $C'X$ with
maximum variance, we must find a vector C satisfying $C'C = 1$
which maximises (1).

Let $\phi = C'\Sigma C - \lambda\,(C'C - 1)$, where $\lambda$ is
the Lagrange multiplier. The vector of partial derivatives is

\[ \frac{\partial \phi}{\partial C} = 2\Sigma C - 2\lambda C \qquad (2) \]

Then

\[ (\Sigma - \lambda I)\,C = 0 \qquad (3) \]

and since $C'C = 1$,

\[ |\Sigma - \lambda I| = 0 \qquad (4) \]

(4) has p roots, $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p \geq 0$.

Consider the orthogonal transformation $U = C'X$, where the
columns of C are now the vectors satisfying (3) for the
successive roots.

The elements $U_1, U_2, \ldots, U_p$ of U are the principal
components. $U_1$ corresponds to the maximum eigenvalue
$\lambda_1$ and is called the first component, $U_2$ is the
second, and so on. The total variation remains the same, even
after the transformation from X to U.
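Equations (3) and (4) amount to an eigenvalue problem for the covariance matrix, which can be sketched numerically as below. The function name and the 2 x 2 covariance matrix are illustrative assumptions; `numpy.linalg.eigh` is used because the matrix is symmetric.

```python
import numpy as np

def principal_components(cov):
    """Eigenvalues (descending) and matching eigenvectors of a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvals[order], eigvecs[:, order]

# Hypothetical covariance matrix of two positively related variables.
sigma_demo = np.array([[2.0, 1.0],
                       [1.0, 2.0]])

lam_demo, C_demo = principal_components(sigma_demo)
print(lam_demo)               # variances of U1, U2
print(np.round(C_demo, 4))    # columns are the coefficient vectors
```

For this matrix the roots are 3 and 1, so the first component (proportional to x1 + x2) carries three quarters of the total variation, and the sum of the roots equals the trace of the covariance matrix, which is the sense in which total variation is preserved.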


4.2 Properties of the principal components
(1) The number of components is the same as the number of
    variables, p in our case.
(2) The new variables (i.e., components) are uncorrelated.
(3) Components come out in order of importance, in descending
    order. (Importance is measured by variation explained.)

The principal components are based on the covariance matrix
$\Sigma$. Often, the sample has mixed variables which are
measured in different units. For example, the psychologist
standardises the various tests first. In such situations there
is justification for standardising throughout and converting the
covariance matrix $\Sigma$ to a correlation matrix R. The
disadvantage is that if the original variables are p-variate
normal, the standardised observations are not, or rather they
are only approximately normal as n becomes large. Another
difficulty is that this device distorts the original
measurements which, for example, took due account of the fact
that the head is a portion of a much larger body. In the
standardised scale the head and the body length are on an equal
footing. The principal components are then obtained from the
eigenvalues of R. Some information contained in X is lost, but
the lost information is kept to a minimum(2).
4.3 The two main uses of principal component analysis are
(1) Getting an independent explanatory set of variables(3),
(2) Reduction in the number of explanatory variables(4),(5).

Example
Consider the data concerning the import activity of the French
economy(6) over the period 1949 to 1966.

y  = Imports
X1 = Domestic production
X2 = Stock formation
X3 = Domestic consumption.

All variables are in milliards of French francs.
Multiple regression analysis gives

\[ y = \underset{(4.125)}{19.730} + \underset{(0.187)}{0.032}\,X_1
       + \underset{(0.322)}{0.414}\,X_2 + \underset{(0.285)}{0.243}\,X_3 \]

(Figures in brackets are standard errors of coefficients)
$R^2 = 0.973$, n = 18
$F_{3,14} = 168.45$


The model is not well specified and it is known this is par- 
tially due to France's trade with EEC in 1960. 


The correlation matrix R of X1, X2 and X3 (in that order) is

\[ R = \begin{pmatrix}
1     & 0.026 & 0.997 \\
0.026 & 1     & 0.036 \\
0.997 & 0.036 & 1
\end{pmatrix} \]


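The principal components are

\[ \begin{aligned}
u_1 &= \phantom{-}0.7063\,X_1 + 0.0435\,X_2 + 0.7065\,X_3 \\
u_2 &= -0.0357\,X_1 + 0.990\,X_2 - 0.0258\,X_3 \\
u_3 &= -0.7070\,X_1 - 0.0070\,X_2 + 0.7072\,X_3
\end{aligned} \]

The covariance matrix of $u_1, u_2, u_3$ is

\[ \begin{pmatrix}
\delta_1 & 0 & 0 \\
0 & \delta_2 & 0 \\
0 & 0 & \delta_3
\end{pmatrix} \]

where the $\delta$'s are the variances of the $u_i$'s and the
eigenvalues of R.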
Us 
If the $\delta$'s are all equal to unity, the original variables
are orthogonal.

If $\delta_i = 0$ exactly, there is a perfect linear
relationship among the original variables, an extreme case of
multicollinearity.

If one of the $\delta_i$ is much smaller than the others (and
near zero), multicollinearity is present.

\[ \delta_1 = 1.999, \quad \delta_2 = 0.998, \quad \delta_3 = 0.003 \]

Since the $\delta$'s are variances of the principal components,
if $\delta_i$ is approximately zero, the corresponding component
is approximately constant.

Here $\delta_3 = 0.003 \approx 0$, so $u_3$ is approximately
constant, and the mean of $u_3$ is zero,

i.e., $u_3 = -0.7070\,X_1 - 0.0070\,X_2 + 0.7072\,X_3 \approx 0$,
or $X_3 \approx X_1$.

This is consistent with $r_{x_1 x_3} = 0.997$.
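The eigenvalues and the near-degenerate third component can be reproduced from the correlation matrix quoted above. The sketch below is only a numerical check; variable names are assumptions, and the signs of eigenvectors returned by a numerical routine are arbitrary.

```python
import numpy as np

# Correlation matrix of X1, X2, X3 as quoted in the example.
R = np.array([[1.0,   0.026, 0.997],
              [0.026, 1.0,   0.036],
              [0.997, 0.036, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)   # ascending order for symmetric R
delta = eigvals[::-1]                  # delta_1 >= delta_2 >= delta_3
smallest_vec = eigvecs[:, 0]           # coefficients of the third component

print(np.round(delta, 3))
print(np.round(smallest_vec, 4))
```

The smallest root is close to zero and its vector weights X1 and X3 almost equally with opposite signs, which is the numerical signature of X3 being nearly equal to X1 in standardised form.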


Estimate $\beta_2$ and the common value $\beta_1 = \beta_3$ by
doing the regression

\[ y = \beta_2 X_2 + (\beta_1 = \beta_3)\,X_4, \quad \text{where } X_4 = X_1 + X_3 \]

\[ y = \underset{(0.109)}{0.612}\,X_2 + \underset{(0.003)}{0.086}\,X_4 - 9.007 \]

(Figures in brackets are standard errors of the estimates)
$R^2 = 0.987$

Both regression coefficients are positive and significant.
The final equation then is

\[ y = 0.086\,X_1 + 0.612\,X_2 + 0.086\,X_3 - 9.007 \]


5. Factor Analysis


5.1 Factor Analysis is concerned with the resolution of a set of
descriptive variables in terms of a small number of categories
or factors. This is achieved by analysis of the
intercorrelations of the variables. The factors thus generated
convey all the essential information of the original set of
variables.

Factor analysis differs from principal component analysis in two
respects. First, the variables are assumed to be analysable into
a small set of factors and an error term. This error term does
not appear in principal components. A portion of $\varepsilon$
associated with variable j is often identified as being a
specific factor. The remaining factors are then termed common
factors. The second difference involves the process of rotating
factors to new orthogonal or even non-orthogonal axes, if such a
rotation will improve the interpretability of the resulting
factors.


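Suppose $X = \Lambda f + \mu + \varepsilon$

where f             = m-component vector of (non-observable)
                      factors
      $\mu$         = fixed vector of means
      $\varepsilon$ = vector of (non-observable) errors
      $\Lambda$     = (p x m) matrix of factor loadings (m < p)

Where f is random, we assume

\[ E(f) = 0, \quad E(\varepsilon) = 0, \quad E(ff') = M, \quad
   E(\varepsilon\varepsilon') = \Psi \text{ (diagonal)} \quad \text{and} \quad
   E(f\varepsilon') = 0 . \]

Then $E(X) = \mu$

and $\mathrm{Cov}(X) = E\,(X-\mu)(X-\mu)' = \Lambda M \Lambda' + \Psi$.

Thus, $\Sigma = \Lambda M \Lambda' + \Psi$.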


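Correlation matrix:

\[ R = \begin{pmatrix}
h_1^2  & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & h_2^2  & r_{23} & \cdots & r_{2n} \\
\vdots & \vdots & \vdots &        & \vdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & h_n^2
\end{pmatrix} \]

where the $h_j^2$ are communalities (unknown), and are obtained
with the help of the inter-correlations. As a first estimate one
can take the square of the multiple correlation coefficient of
that variable with all the other original x variables.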
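The first estimate mentioned above, the squared multiple correlation of each variable with all the others, can be read off the inverse of the full correlation matrix using the standard identity SMC_j = 1 - 1 / r^{jj}, where r^{jj} is the jth diagonal element of the inverse. The sketch below uses that identity; the function names and the 3 x 3 correlation matrix are illustrative assumptions.

```python
import numpy as np

def smc_communalities(R):
    """Squared multiple correlation of each variable with all the others."""
    R = np.asarray(R, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def reduced_correlation_matrix(R):
    """Correlation matrix with communality estimates replacing the unit diagonal."""
    reduced = np.array(R, dtype=float)
    np.fill_diagonal(reduced, smc_communalities(R))
    return reduced

# Hypothetical correlation matrix of three variables.
R_demo = np.array([[1.0, 0.5, 0.4],
                   [0.5, 1.0, 0.3],
                   [0.4, 0.3, 1.0]])

h2_demo = smc_communalities(R_demo)
R_reduced_demo = reduced_correlation_matrix(R_demo)
print(np.round(h2_demo, 4))
```

With only two variables the estimate reduces to r², the squared simple correlation, which gives a quick sanity check on the identity.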


Taking all variables in standard form (zero mean and unit
variance),

\[ z_j = a_{j1}F_1 + a_{j2}F_2 + \ldots + a_{jm}F_m + a_j e_j \]

The sum of squares of the common factor coefficients is the
communality,

\[ h_j^2 = a_{j1}^2 + a_{j2}^2 + \ldots + a_{jm}^2 . \]

Placing ones in the diagonal of the matrix presumes that the
variance is partitioned only among the common factors obtained
by doing principal component analysis. It is this concern which
has led to the use of values(1) other than one in the diagonal.


5.2 The choice of m — the number of factors — is quite im- 
portant. When a large number of unorganised set of variables is 
factored, the analysis will extract the largest and the most in- 
teresting combinations of variables first and then proceed to 
smaller combinations. Carrying the analysis too far (more than 7 
factors) has two penalties. It is exceedingly wasteful on com- 
puter time and it obscures the meaning of the findings because 
it affects the rotation adversely. 


Four stopping criteria may be employed. First, when the analyst
already knows enough about his data to know how many factors
are actually there, he can have the analysis stopped after that
number of factors has been extracted. Secondly, if he has a
clear idea in advance about the amount of variance the factors
can explain, he can stop when that criterion is reached. Most
commonly, however, if he does not know very much about his data
to begin with, he will want to keep factoring until the factors
get small and meaningless. The third criterion is an incremental
approach. After a first set of factors has explained a large
percentage of the variation, say 70 per cent, if the next factor
adds only a small percentage to the total variance, say less
than five per cent, it may be discarded and we could stop
factoring. The final criterion is the most objective. It states
that all factors whose eigenvalues are greater than one when a
correlation matrix is factored can be considered as significant
and meaningful factors. The last two criteria are statistical in
nature.
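The two statistical criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correlation matrix R is hypothetical, the function name count_factors is invented here, and the five per cent threshold is the one quoted above.

```python
import numpy as np

def count_factors(R, min_increment=0.05):
    """Apply the eigenvalue rule and the incremental rule to a correlation matrix."""
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
    share = eigenvalues / eigenvalues.sum()              # proportion of total variance
    # Eigenvalue rule: keep factors whose eigenvalue is greater than one.
    by_eigenvalue = int(np.sum(eigenvalues > 1.0))
    # Incremental rule: stop at the first factor adding less than min_increment.
    small = np.nonzero(share < min_increment)[0]
    by_increment = int(small[0]) if small.size else len(share)
    return by_eigenvalue, by_increment, np.cumsum(share)

# Hypothetical correlation matrix: two pairs of strongly related variables.
R = np.array([[1.00, 0.85, 0.10, 0.10],
              [0.85, 1.00, 0.10, 0.10],
              [0.10, 0.10, 1.00, 0.90],
              [0.10, 0.10, 0.90, 1.00]])

by_eigenvalue, by_increment, cumulative = count_factors(R)
print(by_eigenvalue, by_increment, cumulative.round(4))
```

For this matrix both rules point to two factors, which together account for a little under 94 per cent of the total variance.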


5.3 The two commonly used methods of extracting factors are
Principal factor solution (Centroid Method) and Maximum
likelihood solution(2). The principal factor solution is simply
the application of the mathematics of principal component
analysis (Chapter 4.1) to the reduced correlation matrix, i.e.,
the one with communalities ($h_j^2$) in place of ones. The
principal factor solution has the important property of the
principal component analysis inasmuch as it produces factors in
the order of the amount of variation they explain. It also tends
to produce bipolar factors with some high negative loadings as
well as some high positive loadings. Nowadays, the method of
maximum likelihood solution is becoming increasingly available.
This is an efficient method of extracting factors which does not
require advance knowledge of the communalities ($h_j^2$) but
does need to be told the number of factors to be produced by the
analysis.
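A minimal sketch of the principal factor solution follows, assuming the communalities are estimated by squared multiple correlations and that no iteration on the communalities is carried out. The matrix R and the function name principal_factor are hypothetical and serve only for illustration.

```python
import numpy as np

def principal_factor(R, m):
    """Principal factor loadings from the reduced correlation matrix."""
    R = np.asarray(R, dtype=float)
    # Squared multiple correlation of each variable with all the others.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)            # communalities in place of ones
    values, vectors = np.linalg.eigh(reduced)
    order = np.argsort(values)[::-1][:m]      # the m largest eigenvalues
    kept = np.clip(values[order], 0.0, None)  # guard against small negative values
    loadings = vectors[:, order] * np.sqrt(kept)
    return loadings, smc

# Hypothetical correlation matrix of four variables.
R = np.array([[1.00, 0.85, 0.10, 0.10],
              [0.85, 1.00, 0.10, 0.10],
              [0.10, 0.10, 1.00, 0.90],
              [0.10, 0.10, 0.90, 1.00]])

loadings, smc = principal_factor(R, m=2)
explained = (loadings ** 2).sum(axis=0)       # variation explained by each factor
print(loadings.round(3), explained.round(3))
```

The factors come out in decreasing order of the variation they explain, which is the property inherited from principal component analysis.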
To get over the problem that analysis gives more than one
set of factor loadings, methods have been developed to produce
solutions which are unique in a certain sense. These methods are
known as rotations. It is important to note that by changing the
factor axes, the basic structure of the data, as found by the
factor analysis, does not get altered by rotation. The new
loadings can often give much better meaning or interpretation to
the data. We rotate the factors to simplify the pattern of
factor loadings so that they have the properties of 'simple
structure'(3). These properties are:

(1) Each row of the factor loading matrix should have at
least one zero.

(2) If there are m common factors, each column of the
factor loadings matrix should have at least m zeroes.

(3) For every pair of columns of the factor loadings
matrix there should be several variables whose entries
vanish in one column but not in the other.

(4) For every pair of columns of the factor loadings matrix
there should be only a small number of variables with
high loadings in both columns.


[Table: schematic pattern of factor loadings (variables by factors) under simple structure. A line indicates values present.]


The two commonly used methods of rotation consistent with
simple structure are 'varimax' and 'promax'. The varimax
rotation is one of the so-called orthogonal (uncorrelated)
rotations, which means that the axes are kept at right angles to
each other. On the other hand, promax rotation is an example of
oblique (or correlated) rotation where the axes are not only
rotated but may be bent towards each other to produce factors
which are correlated. With batteries of semantic rating scales,
the 'promax' method usually gives a slightly 'tidier', more
interpretable solution than the varimax method, but basically
the same factor structure(4).
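The following sketch shows an orthogonal rotation at work. It uses a commonly quoted iterative form of the varimax algorithm based on the singular value decomposition; the loading matrix is hypothetical and the code is meant as an illustration rather than a reference implementation. It also checks the point made above: rotation changes the loadings but leaves the communality of every variable unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally towards the varimax criterion."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        target = rotated ** 3 - rotated @ np.diag(column_ss) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                     # nearest orthogonal matrix
        current = s.sum()
        if previous != 0.0 and current < previous * (1.0 + tol):
            break
        previous = current
    return L @ rotation, rotation

# Hypothetical unrotated loadings of six variables on two factors.
unrotated = np.array([[0.70, 0.40], [0.65, 0.45], [0.60, 0.50],
                      [0.60, -0.50], [0.65, -0.45], [0.70, -0.40]])

rotated, rotation = varimax(unrotated)
communality_before = (unrotated ** 2).sum(axis=1)
communality_after = (rotated ** 2).sum(axis=1)
print(rotated.round(3))
```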


5.4 The applications of factor analysis up to the present
time have been mainly in the field of psychology, because the
methods were invented by psychologists for dealing with certain
of their problems. They are planning experiments employing
factor analysis to determine a small number of tests to describe
the human mind as completely as possible. The methods of factor
analysis have been successfully applied in recent years in
fields as varied as political science, business and medicine.
Mukherjee(5) conducted a factor analysis of fourteen measures of
individual coffee preferences (such as pleasant versus
unpleasant flavour, cheap versus expensive taste, etc.) as a
basis for determining what dimensions, if any, underlay the
original measures, and hence could serve as a better basis for
understanding as well as analysing individual preferences. The
original fourteen measures were reduced to four underlying
factors. Stoetzel(6) obtained three factors based on rankings of
various types of liquors (rum, whisky, etc.) by a sample of
French consumers. After looking at factor loadings, he labelled
them as sweetness, price and regional popularity factors, based
on his external knowledge about the liquor industry.

By means of factor analysis, the computation of multiple or
partial correlations and regression coefficients can be greatly
simplified when many variables are involved. This is usually
done by identifying likely variables from a much larger set of
variables(7). Twedt(8) isolated three variables based on a
factor analysis of 19 predictor variables (various aspects of
advertisements such as size, colour, layout, etc.) and the
criterion variable of readership. He then used these three
variables for predicting readership by doing multiple
regression. In a study of household brand proneness(9) for 44
specific grocery products a decision was made to delete the age
of husband as an independent variable in the analysis because it
had an extremely high correlation with the wife's age. Similarly
Massy(10), in studying the variation in television ownership in
240 urban areas, conducted a factor analysis of 27 variables:
the percentage distribution of households in 14 income
categories, 9 education categories and 4 measures of T.V.
coverage. Of these 27 variables, no more than 10 resulting new
measures were utilised in subsequent regression analysis.

Factor analysis is also used to obtain factor scores of
individuals in the sample for further analysis. This not only
reduces the large sets of data to a manageable level but also
removes the collinearity in the original variables. Farley(11)
used factor scores to explain variability in brand loyalty
across products.

6. Cluster Analysis


6.1 Cluster analysis is the general procedure by which we
either group together the variables on the basis of their
general properties across the objects, called the clustering of
variables (V analysis)(2), or group together the objects into
types or classes, called the clustering of objects
(O analysis)(3).

6.2 The basic criterion for cluster analysis technique is that 
clusters should be within-group similar and between-group dif- 
ferent. There are three main measures of similarity: 

i) Distance measures 
ii) Correlation measures 
iii) Similarity measures designed for attribute data. 

6.2.1 Distance Measures: Distance measures tend to be
exclusively Euclidean distance D. For two objects with
coordinates $(x_1, y_1)$ and $(x_2, y_2)$,

$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

Small D indicates closeness.

Distances among the objects in a given cluster will approach
zero but the interspace differences between objects in
different clusters will not be zero. Centres of density are
located by working with the distance matrix.

Two problems exist with the use of the Euclidean distance
measure: (1) correlated characteristics of the variables and (2)
the non-comparability of the original units in which the
characteristics are measured. The second is usually 'solved' by
standardising all the characteristics to mean zero and unit
standard deviation. Thus it is assumed that the mean and
variance among characteristics is not important. The first
problem can be handled by using principal component analysis
(Chapter 4) on the characteristics and the factor scores
computed for the objects. Each component score may then be
weighted by the square root of the eigenvalue associated with
that component before computing the distance measure. In
practice, a distance measure of this kind is usually used when
the data are at least intervally scaled.
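The effect of standardising can be illustrated with a short sketch. The data, the variable meanings and the function name standardised_distances are all hypothetical. The point shown is that, once every characteristic has mean zero and unit standard deviation, the distance matrix no longer depends on the units in which the characteristics were recorded.

```python
import numpy as np

def standardised_distances(X):
    """Euclidean distance matrix after scaling each column to mean 0, s.d. 1."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    difference = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((difference ** 2).sum(axis=2))

# Hypothetical objects: income (rupees) and age (years).
X = np.array([[12000.0, 25.0],
              [15000.0, 40.0],
              [30000.0, 42.0],
              [31000.0, 60.0]])

D = standardised_distances(X)
# Same objects with income in thousands of rupees and age in months.
D_rescaled = standardised_distances(X * np.array([0.001, 12.0]))
print(D.round(3))
```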


6.2.2 Correlation measures: The correlation coefficient as a 
measure of similarity between two objects is widely used. 
Completely different results are obtained in cluster analysis if 
one uses the correlation coefficient instead of Euclidean 
distance measure, and one should attach different interpreta- 
tion to the results. Distances (D) measure differences between 
people more powerfully than the correlation measure. Correla- 
tions, however, will measure patterns in responses, regardless 
of the distance between patterns. The correlation coefficient 
can be considered as a type of distance measure. 

If you consider two vectors V and U, the distance between
them is given by


$D = \sqrt{(V - U)'(V - U)} = \sqrt{V'V + U'U - 2V'U}$


If V and U are scaled to zero mean and unit length, then


$D = \sqrt{2 - 2r}$

where r is the correlation coefficient between V and U.
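The relation between distance and correlation can be checked numerically. The two vectors below are hypothetical; the helper unit_scale is introduced here only to carry out the scaling to zero mean and unit length.

```python
import numpy as np

def unit_scale(v):
    """Centre a vector on zero and scale it to unit length."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    return v / np.linalg.norm(v)

v_raw = [3.0, 5.0, 2.0, 8.0, 7.0]   # hypothetical profile of object V
u_raw = [1.0, 4.0, 2.0, 6.0, 9.0]   # hypothetical profile of object U

V, U = unit_scale(v_raw), unit_scale(u_raw)
r = float(V @ U)                               # correlation coefficient
D = float(np.sqrt((V - U) @ (V - U)))          # Euclidean distance
print(round(r, 4), round(D, 4), round(float(np.sqrt(2 - 2 * r)), 4))
```

After this scaling V'V = U'U = 1 and V'U = r, so the general expression reduces to the square root of 2 - 2r.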


Three problems are related to this technique. There is loss
of information as the correlation removes the elevation and
scatter of each object. Some objects may be split among
clusters when one uses the factor loadings for grouping objects.
Finally, the analyst must usually resort to a variable analysis
to interpret the clusters' characteristics according to their
correlations with the underlying factors.


6.2.3 Similarity Measures (Attribute data): Similarity
measures are useful when the characteristics of each object are
only nominally scaled, which is often the case with attribute
data. The usual notion of distance is less applicable here;
however, it is still possible using what is known as
multi-dimensional scaling (discussed later in chapter 7). The
similarity measure is the match coefficient obtained by
attribute matching.


If two objects are compared on each of 10 attributes, with the
following results:

                      Attribute
Object    1  2  3  4  5  6  7  8  9  10
  1       .  .  .  .  .  .  .  .  1  0
  2       0  1  1  0  1  0  1  1  1  1

(entries shown as . are illegible in the source)


The fractional match coefficient is given by the number of
like entries in the columns, divided by the number of columns,
e.g. 4/10. If weak matches (non-possession of attribute) are to
be de-emphasized, one can use the modified match coefficient,
using only the possession of attribute as a factor in the ratio,
in this example 3/9.
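A small sketch of the two coefficients follows. The attribute profile of object 1 is hypothetical, chosen only so that it reproduces the 4/10 and 3/9 quoted above; object 2 is as given. The modified coefficient is read here as matches on possession divided by the number of columns that are not joint absences, which is an assumption about the intended definition.

```python
def match_coefficients(a, b):
    """Return (matches, columns) for the fractional and the modified coefficient."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    both_present = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_absent = matches - both_present
    fractional = (matches, len(a))
    # Joint absences are dropped from numerator and denominator alike.
    modified = (both_present, len(a) - both_absent)
    return fractional, modified

object_1 = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical profile
object_2 = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1]   # profile of object 2

fractional, modified = match_coefficients(object_1, object_2)
print(fractional, modified)
```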


Similarity coefficients have a set of limitations. Firstly, if a
group is to be formed on the basis of overall matches, two
objects may not be grouped even if they match well on some
subset of characteristics. Secondly, if a large number of
characteristics are involved, objects which match may do so for
accidental reasons, reflecting the noise in the data. Thirdly,
if some variables are dichotomous and others are
multichotomous, the two-state attributes will tend to be more
heavily weighted in the similarity measures. Finally, if
continuous data are discretised in order to use similarity
measures, valuable information can be lost. The analyst thus has
the problem of deciding the kinds of attributes to include and
the number of states to be associated with each.

Scattergram will generally give an idea of grouping, lineari- 
ty and heteroscedasticity. 

6.3 Key-Cluster Factoring: The objective of key-cluster
factoring is (1) to select mutually collinear variables defining
each of the clusters, (2) on the basis of the proportion of
communalities accountable from the scores on the dimensions, to
select minimal or salient k classes and (3) to provide
information, if there are more clusters than minimally
sufficient, on which ones to delete.


There are five criteria to keep in mind when assessing the
results of the initial factoring procedure:
(1) the degree of collinearity of the definers of each
dimension,
(2) the degree of independence of the oblique cluster
that defines each dimension,
(3) the meaning of each defining oblique cluster as a
construct,
(4) the contribution of the definers to the reliability of a
cluster score (which is the sum of Z scores of the
variables in the cluster) on the oblique cluster,
and (5) the generality of each oblique cluster and of the
variables that define it.

In the case of the graphical method the nearness of variables is
used in selecting the variables. Otherwise one should use the
index of collinearity $P^2$ (1, d)


Two variables $V_1$ and $V_2$ are said to be perfectly
collinear if the correlations variable $V_1$ has with all other
variables ($V_3, V_4, \dots, V_n$) are in a perfect ratio to the
correlations $V_2$ has with the same variables
($V_3, V_4, \dots, V_n$).

If $V_1$ and $V_2$ are perfectly collinear,

$\frac{r_{1i}}{r_{2i}} = c$ for $i = 3, 4, \dots, n$, or $r_{1i} = c\, r_{2i}$

where $r_{1i}$ and $r_{2i}$ are the correlations variables 1 and
2 have with variable i. Then Tryon's index of collinearity
$P^2$ is

$P^2 = \frac{\left(\sum_{i=3}^{n} r_{1i} r_{2i}\right)^2}{\left(\sum_{i=3}^{n} r_{1i}^2\right)\left(\sum_{i=3}^{n} r_{2i}^2\right)} = 1$ if perfectly collinear


Thus the closer P2 is to one, the more collinear the 
variables are. 
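The index is easily computed. The sketch below assumes the usual form of Tryon's index, the squared sum of cross products of the two sets of correlations divided by the product of their sums of squares; the correlation values are hypothetical.

```python
def collinearity_index(r1, r2):
    """Index of collinearity between two variables.

    r1 and r2 hold the correlations of variables 1 and 2 with each of the
    remaining variables 3, ..., n.
    """
    cross = sum(a * b for a, b in zip(r1, r2))
    return cross ** 2 / (sum(a * a for a in r1) * sum(b * b for b in r2))

r1 = [0.60, 0.30, 0.45, 0.15]   # hypothetical correlations of variable 1
r2 = [0.40, 0.20, 0.30, 0.10]   # proportional to r1 (ratio c = 1.5)
r3 = [0.10, 0.50, 0.05, 0.40]   # a different pattern of correlations

p2_collinear = collinearity_index(r1, r2)
p2_other = collinearity_index(r1, r3)
print(round(p2_collinear, 4), round(p2_other, 4))
```

When the two sets of correlations are proportional the index equals one; for any other pattern it falls below one.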

The cut-off rule for finding clusters is: if less than 5% of the
estimated overall communalities is accounted for by the new
cluster, stop finding new clusters. How many clusters to form
also depends on the knowledge of the analyst of the particular
project and the meaningful interpretation he/she can give to the
clusters.

Key clustering, being near to natural grouping, should be used
in preference to varimax and principal axes methods (see
Chapters 4 and 5).


The graphic solution may indicate the variables of oblique [...]

To illustrate a feature analysis model of pattern recognition,
Lindsay and Norman in their book 'Human Information
Processing' analysed the upper-case letters of the alphabet in
terms of a likely set of features.

These seven features were: 

(1) Vertical lines, (2) Horizontal lines, (3) Oblique lines; 

(4) Right angles, (5) Oblique angles, (6) Closed curves, 

(7) Open curves. 

The attributes of the letter 'A' were thus one horizontal
line, two oblique lines and three oblique angles, and those of
the letter 'Q' one oblique line, two oblique angles and one
closed curve. Of course, this feature list, used mainly for
purposes of illustration, seemed plausible, as did many others,
but usually no evidence was presented to suggest that our visual
system did analyse input in the way suggested.

One way of assessing the adequacy of the above feature 
list would be to carry out a Cluster Analysis to identify 
similarities and differences amongst letters and to assess the 
extent to which the resulting clusters made sense. If the 
clusters seemed reasonable, then this, perhaps, might be 
regarded as a weak support of the model. 


If not, then:
(a) the feature model itself might be inappropriate;
(b) an inappropriate set of features might have been chosen;
(c) the appeal to intuition might not be valid;
(d) clustering might not be a suitable technique for analysing
these data.


The Furthest Neighbour Method gives

[Table: cluster number assigned to each of the letters A to Z; the values are not recoverable from the source.]

Here, the larger value defines the distance between the
remaining clusters. In the nearest neighbour method clusters are
joined in terms of the nearest entity in one cluster to another
(the single link), while in the furthest neighbour method all
entities in one cluster are linked to another. The linkage is
therefore complete.
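The difference between the two linkage rules can be seen in a small sketch. The six points on a line are hypothetical, and the function names are introduced here for illustration only; this is a plain agglomerative procedure, not the package used for the letter data.

```python
from itertools import combinations

def cluster_distance(a, b, dist, linkage):
    """Distance between clusters a and b: nearest pair or furthest pair."""
    pairwise = [dist[i][j] for i in a for j in b]
    return min(pairwise) if linkage == "single" else max(pairwise)

def agglomerate(dist, k, linkage="single"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(
                clusters[ab[0]], clusters[ab[1]], dist, linkage),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical objects on a line, with slowly widening gaps between them.
points = [0, 10, 21, 33, 46, 60]
dist = [[abs(p - q) for q in points] for p in points]

single = agglomerate(dist, 2, "single")
complete = agglomerate(dist, 2, "complete")
print(single, complete)
```

With the nearest neighbour rule the objects chain together one at a time, leaving the last object on its own, whereas the furthest neighbour rule gives two more compact groups.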

With the seven feature data, the minimum distance or nearest
neighbour method runs into problems because the distances at
which the larger clusters are combined are the same as the
distances at which the smaller ones were joined. This method
forces the between-cluster distances down very rapidly.


The three cluster solution is:


(1) AKQVWXYZ
(2) BCDEFGHIJLOPRSTU
(3) MN


This agglomeration is illustrated in the rather strange looking
dendrogram below.

[Figure: Dendrogram using Single Linkage. Horizontal axis: rescaled distance at which clusters combine (0 to 25); rows: cases by label and sequence number.]

Note that in the dendrogram the distance units 1-3 are rescaled to 0-25.

The cluster membership using complete linkage is
(A,K,Q,V,W,X,Y,Z) (B,C,D,E,F,G,H,I,J,L,O,P,R,S,T,U) (M,N)

[Figure: Dendrogram using Complete Linkage. Horizontal axis: rescaled distance at which clusters combine (0 to 25); rows: cases by label and sequence number.]


Some of the furthest neighbour groups seem wrong; a better
classification might be


(A,K,M,N,V,W,X,Y,Z) (E,F,H,I,L,T) (B,C,D,G,J,P,R,S,U) (O,Q)


The features oblique angles and right angles force a
counter-intuitive classification. Of course, if it happened that
people confused C's and T's more than they confused T's and
I's, then the furthest neighbour classification might have some
merit.

Quite probably, the sum of features not shared by two letters
does not provide an adequate distance metric but, on the other
hand, the clustering does illustrate both some problems with
the feature list and some of the choices that have to be made
when using cluster analysis.


Some final points 


(1) If the data are on an interval or near interval scale or
better, then consider taking the defaults: squared
Euclidean distances and between-cluster averages.

(2) For binary data, and perhaps for count data, generate a
dissimilarity matrix; for count data the dissimilarity
matrix is based on $\chi^2$.


7. Automatic Interaction Detector


Automatic Interaction Detector (AID) is a computer pro- 
gramme which operates under the University of Michigan Ex- 
ecutive System and was extensively developed by Morgan and 
Sonquist (1, 2). It is focussed on a particular kind of data- 
analysis problem, characteristic of many social science 
research situations, in which the purpose of the analysis in- 
volves more than the reporting of descriptive statistics, but 
may not necessarily involve the exact testing of specific 
hypotheses. In this type of situation the problem is often one of 
determining which of the variables, for which data have been 
collected, are related to the phenomenon in question, under 
what conditions, and through what intervening processes, with 
appropriate controls for spuriousness. 


The data-model to which the procedure is applicable may
be termed a "sample survey model", in which values of a set of
predictors $X_1, X_2, \dots, X_p$ and a dependent variable Y
have been obtained over a set of observations. In particular
this analysis situation is defined to be one in which the $X_k$
are a mixture of nominal and/or ordinal scales and Y is a
continuous, or equal interval, scale.


7.1 The AID Technique: Regarding one of the variables as
a dependent variable, the analysis employs a nonsymmetrical
branching process, based on variance analysis techniques, to
subdivide the sample into a series of subgroups which maximise
one's ability to predict values of the dependent variable.
Linearity and additivity assumptions inherent in conventional
multiple regression techniques are not required.


In actual operation, the program works as follows: 


1. The total input sample is considered the first (and indeed 
only) group at the start. 


2. Select that unsplit sample group, group i, which has the
largest total sum of squares


$TSS_i = \sum_{a=1}^{N_i} Y_a^2 - \frac{\left(\sum_{a=1}^{N_i} Y_a\right)^2}{N_i}$   (1)

such that for the i'th group

$TSS_i > R\,(TSS_T)$ and $N_i > M$   (2)


where R is an arbitrary parameter (normally .01 < R < .10)
and M is an arbitrary integer (normally 20 < M < 40).


The requirement (2) is made to prevent groups with little 
variation in them, or small number of observations, or both, 
from being split. That group with the largest total sum of 
squares (around its own mean) is selected, provided that this 
quantity is larger than a specified fraction of the original total 
sum of squares (around the grand mean), and that this group 
contains more than some minimum number of cases (so that 
any further splits will be credible and have some sampling 
stability as well as reducing the error variance in the sample). 


3. Find the division of the $C_k$ classes of any single
predictor $X_k$ such that combining classes to form the
partition p of this group i into two non-overlapping subgroups
on this basis provides the largest reduction in the unexplained
sum of squares. Thus, choose a partition so as to maximise the
expression


$(n_1 \bar{y}_1^2 + n_2 \bar{y}_2^2) - N_i \bar{y}_i^2 = BSS_{ikp}$   (3)


where $N_i = n_1 + n_2$

and $\bar{y}_i = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{N_i}$

for group i over all possible binary splits on all predictors,
with restrictions that (a) the classes of each predictor are
ordered into a descending sequence, using their means as a key,
and (b) observations belonging to classes which are not
contiguous (after sorting) are not placed together in one of the
new groups to be formed. Restriction (a) may be removed, by
option, for any predictor $X_k$.

4. For a partition p on variable k over group i to take place
after the completion of step 3, it is required that


$BSS_{ikp} > Q\,(TSS_T)$   (4)


where Q is an arbitrary parameter in the range .001 < Q < R,
and $TSS_T$ is the total sum of squares for the input sample.
Otherwise group i is not capable of being split; that is, no
variable is "useful" in reducing the predictive error in this
group. The next most promising group ($TSS_i$ = maximum) is
selected via step 2 and step 3 is then applied to it, etc.


5. If there are no more unsplit groups such that require- 
ment (2) is met, or if, for those groups meeting it, requirement 
(4) is not met (i.e. there is no “useful” predictor), or if the 
number of currently unsplit groups exceeds a specified input 
parameter, the process terminates. 
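Steps 3 and 4 can be sketched for a single group as follows. This is a simplified illustration and not the AID program itself: the data, the predictor names and the value of Q are hypothetical, and the recursion over groups (steps 2 and 5) is left out.

```python
import numpy as np

def total_ss(y):
    """Total sum of squares of y around its own mean, as in expression (1)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(y ** 2) - np.sum(y) ** 2 / len(y))

def best_split(y, predictors):
    """Best binary split over all predictors, maximising expression (3)."""
    y = np.asarray(y, dtype=float)
    n, group_mean = len(y), y.mean()
    best = (0.0, None, None)
    for name, x in predictors.items():
        x = np.asarray(x)
        # Restriction (a): order the classes by descending mean of y.
        classes = sorted(set(x.tolist()), key=lambda c: -y[x == c].mean())
        # Restriction (b): only cuts between adjacent classes are tried.
        for cut in range(1, len(classes)):
            left = np.isin(x, classes[:cut])
            n1, n2 = int(left.sum()), int((~left).sum())
            bss = (n1 * y[left].mean() ** 2 + n2 * y[~left].mean() ** 2
                   - n * group_mean ** 2)
            if bss > best[0]:
                best = (float(bss), name, set(classes[:cut]))
    return best

# Hypothetical group: dependent variable y and two coded predictors.
y = [10, 11, 9, 10, 1, 2, 1, 2]
predictors = {"x": [1, 1, 2, 2, 3, 3, 3, 3],
              "z": [1, 2, 1, 2, 1, 2, 1, 2]}

bss, variable, left_classes = best_split(y, predictors)
Q = 0.006                                   # hypothetical value of Q
useful = bss > Q * total_ss(y)              # requirement (4), with TSS_T = TSS of y
print(bss, variable, left_classes, useful)
```

Here the split of predictor x into classes {1, 2} against class {3} removes 144.5 of the total sum of squares of 147.5, so the requirement in step 4 is easily met.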

7.2 Examples: In predicting Income (2), Age, Race, Education,
Occupation and Length of Time in present job are used. Age is
an ordered series of categories represented by the numbers
(1, 2, ..., 6). Race is coded (1 or 2), Occupation is coded
(1, 2, ..., 5), Education is coded (1, 2, 3) and Time on Job is
coded (1, 2, ..., 5). We find the following mutually exclusive
groups whose means may be used to predict the income of
observations falling into that group:

Group  Type                                           N    Mean     σ
                                                           Income

12     Age 46-65, white, college                      8    $8777   $773
13     Age under 45, white, college                  12     6005    812
10     Age 36-65, white, no college, non-laborer     24     5794    487
11     Age under 35, white, no college, non-laborer  16     3752    559
 9     Age under 65, white, no college, laborer      10     2750    250
 5     Age under 65, nonwhite                        10     2010     10
 3     Age over 65                                   10     1005      5

       Total                                         90     4434   2263
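The figures in the table are enough for a rough check of how much of the variation in income the seven groups account for. The sketch below assumes that the last column is a standard deviation and that each sum of squares can be approximated by N times the squared standard deviation; both are assumptions made for illustration.

```python
# (N, mean income, standard deviation) for the seven final groups in the table.
groups = [
    (8, 8777, 773), (12, 6005, 812), (24, 5794, 487), (16, 3752, 559),
    (10, 2750, 250), (10, 2010, 10), (10, 1005, 5),
]
total_n, total_sd = 90, 2263          # "Total" row of the table

n = sum(g[0] for g in groups)
grand_mean = sum(g[0] * g[1] for g in groups) / n
within_ss = sum(g[0] * g[2] ** 2 for g in groups)   # variation left inside groups
total_ss = total_n * total_sd ** 2                  # variation in the whole sample
explained = 1 - within_ss / total_ss

print(n, round(grand_mean, 1), round(explained, 3))
```

Under these assumptions the group sizes add up to 90, the weighted mean agrees with the total mean of about 4434, and the groups account for roughly 95 per cent of the variation.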


A one-way analysis of variance over these seven groups would
account for 95 per cent of the variation in income. These
results are arrived at by the following procedure, as
represented by the tree of binary splits:

[Figure: tree of binary splits, from (1) the total sample (N = 90, mean income 4434) down to the seven final groups listed in the table above.]

When the total sample (group 1) is examined, the maximum
reduction in the unexplained sum of squares is obtained by
splitting the sample into two new groups, "age under 65"
(classes 1-5 on age) and "age 65 and over" (those coded 6 on
age). Note that each group may contain some nonwhites and
varying education and occupation groups. Group 2, the
"under-65" people, are then split into "white" and "nonwhite".
Note that group 5, the "nonwhites", are all under age 65.
Similarly, the "white, under age 65" group is further divided
into college and noncollege individuals, etc. A group which can
no longer be split is marked with an asterisk and constitutes
one of the above final groups. The variable "Length of Time in
Present Job" has not been used. At each step there existed
another variable which proved more useful in explaining the
variance remaining in that particular group.

7.3 Limitations and Applications: There are several limitations
in using AID (3). Data sets with a thousand cases or more are
necessary; otherwise the power of the search process must be
restricted drastically or those processes will carry one into a
never-never land of idiosyncratic results (4). A well-behaved
dependent variable without extreme cases or severe bimodalities
is also assumed. A dichotomous dependent "variable" is usable if
it takes on one of its values more than 20 and less than 80 per
cent of the time. The predictors should be classifications,
where each of the classes is in a single dimension; otherwise
one really should make dichotomies out of each of the
categories. Finally, some theory must be applied, if only in the
selection of the predictors.

In recent years, AID has been used in marketing. In one such
analysis, Heald (5) has studied the factors which influence the
turnover of outlets. The technique has also been used in
assessing store performance and site selection (6), segmenting
markets by group purchasing behaviour (7), and constructing
marketing indicators (8).

8. Non-Metric Multi-Dimensional Scaling (NMS)


Numerous qualitative approaches (1) are used to study the 
attitude of potential consumers to brands and advertising cam- 
paigns. The technique of NMS is simple. 

The basic idea of the method (2) is that one can determine 
how consumers view competing products or brands without 
asking them complicated questions requiring numerical ratings 
along various scales deemed of importance. Instead a virtually 
complete picture can be built up by asking respondents simply 
to rank pairs of products in the order of overall similarity. 

For example, a respondent would be given a pile of cards.
On each card would be the names (or descriptions) of two
products from the assortment under consideration. She is then
shown how to arrange the cards in her order of decreasing
similarity, so that the pair at the bottom will be the least
similar and the pair at the top the most alike. These ordered
cards are then fed into the programme and the resulting output
is a configuration of points (products) in multidimensional
(usually two or three) space. The distances between the products
should be consistent with the rankings given by the respondent.
If four brands A, B, C and D are ranked so that A and C are the
most similar and B and D the least, the configuration might look
like figure 1. Furthermore, the dimensions will usually be
interpretable. Here B is shown to be seen as an 'efficient'
product whereas D is low on 'efficiency' but seen as high on the
'luxury' dimension.

[Figure 1: configuration of brands A, B, C and D in two dimensions, 'luxury' and 'efficiency'.]

Spatial configurations are reproduced from only ranked
data and the ranks act as constraints which greatly limit
where the algorithm can place the points.
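The way the ranks constrain the configuration can be illustrated with a small check. The coordinates and the ranking below are hypothetical (they are not read from figure 1), and the function is not an NMS programme; it only tests whether a given configuration agrees with a respondent's ordering of the pairs.

```python
from math import dist

def distances_follow_ranking(points, ranked_pairs):
    """True if the distances never decrease along the respondent's ranking.

    ranked_pairs runs from the most similar pair to the least similar pair.
    """
    d = [dist(points[a], points[b]) for a, b in ranked_pairs]
    return all(d[i] <= d[i + 1] for i in range(len(d) - 1))

# Hypothetical two-dimensional configuration for brands A, B, C and D.
points = {"A": (1.0, 1.0), "B": (3.0, 0.5), "C": (1.5, 1.2), "D": (0.0, 3.0)}

# Hypothetical ranking: A and C the most similar pair, B and D the least.
ranking = [("A", "C"), ("B", "C"), ("A", "B"),
           ("A", "D"), ("C", "D"), ("B", "D")]

consistent = distances_follow_ranking(points, ranking)
# Swapping the first and last pairs contradicts the configuration.
swapped = [ranking[-1]] + ranking[1:-1] + [ranking[0]]
inconsistent = distances_follow_ranking(points, swapped)
print(consistent, inconsistent)
```

With four brands there are six pairs and therefore five ordering constraints on the distances; the number of constraints grows quickly as products are added, which is why the points have little freedom to move.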


8.1 Advantages of NMS: Compared to the traditional
methods, NMS has some appealing features (3):
(1) Only ranked data are required:
    - Gives more precise information without sacrificing
      usability
(2) Factors are not pre-specified:
    - No pre-structuring of questions
(3) Considers all relevant dimensions:
    - Unlike most attitude scaling, treats perceptions of
      consumers as multi-dimensional
(4) Differences in views:
    - Allows segmenting the respondents
(5) Incorporating preferences:
    - Incorporates perception with preference
Finally,
(6) Handling missing data:
    - Allows missing data without biasing other
      relationships
    - Remaining data sufficient to constrain the
      configuration.

8.2 Application of the Technique: NMS is successfully used in
market segmentation analysis (2,3) and new product studies, and
also for test marketing and developing perceptual spaces (4).


In a study of the market for conv[...], the objective was to
know how many dimensions were important to consumers, what
these dimensions [...].


The essential data were collected from a sample of housewives
in the target group. Each was given a set of cards containing
the names of all pairs of eight products under study and shown
how to sort them in the order of decreasing overall similarity.
How similarity was defined was left open to the respondent.
Next, the housewives ordered the brands in terms of preference.
Finally, information was collected about usage and attitudes
from a conventional questionnaire.


First, two points of view were distinguished, one group
being heavy users and the other light users. Both groups were
then analysed using the NMS programme. Three dimensions
appeared to describe the perceptions of consumers
satisfactorily; these dimensions appeared to be 'nutrition',
'value for money' and 'substantiality'. The relative position
of the products for the heavy user group using two of the
dimensions is shown in the diagram. Finally, the preference data
were used to calculate an average 'ideal point'.

However, like all other techniques, NMS has its limitations.
The first stems from the computational side. How unique are the
attribute spaces given noisy and/or missing data? How reliable,
statistically, are the solutions? Green (5) and Percy (6) have
studied this problem and have suggested certain improvements to
tackle it. Secondly, there is the problem concerning distance
measurement. There are other distance measures, apart from the
Euclidean distance. These measures may give drastically
different interpretations. However, NMS used with proper
discrimination offers a possibility of new insights into the
analysis of market behaviour.


[Figure: relative positions of the products for the heavy user group on the 'Value' and 'Nutrition' dimensions. Key: FB, frozen beefburgers; F&C, fish and chips; FCS, frozen cod steaks; FP, fish plaice; FFF, frozen fish fingers; LC, lamb chops; FCP, frozen chicken pie; S, pork sausages.]

APPENDIX A


[Figure: classification chart of all multivariate methods. The first question, "Are some of the variables dependent on others?", divides them into dependence methods and interdependence methods. Methods named in the chart: factor analysis, cluster analysis, metric multidimensional scaling, multivariate analysis of variance, latent structure analysis, nonmetric scaling, multiple discriminant analysis, canonical analysis, automatic interaction detector.]

APPENDIX B

Linearizable functions with corresponding transformations


Function                                                            Transformation                                   Linear form                      Graph shown in figure

$y = \alpha x^{\beta}$                                              $y' = \log y$, $x' = \log x$                     $y' = \log\alpha + \beta x'$     a, b
$y = \alpha e^{\beta x}$                                            $y' = \ln y$                                     $y' = \ln\alpha + \beta x$       c, d
$y = \alpha + \beta \log x$                                         $x' = \log x$                                    $y = \alpha + \beta x'$          e, f
$y = \frac{x}{\alpha x - \beta}$                                    $y' = \frac{1}{y}$, $x' = \frac{1}{x}$           $y' = \alpha - \beta x'$         g, h
$y = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$         $y' = \ln\left(\frac{y}{1 - y}\right)$           $y' = \alpha + \beta x$          i

Graphs of Linearizable Functions

[Figures (a) to (i): graphs of the linearizable functions listed above for different signs and ranges of $\alpha$ and $\beta$.]

APPENDIX C


Transformations to Stabilise Variance


Probability       Variance of y in            Transformation                                                   Resulting
distribution      terms of its                                                                                 variance
of variable y     mean $\mu$

Poisson           $\mu$                       $\sqrt{y}$ or $(\sqrt{y} + \sqrt{y + 1})$                        0.25
Binomial          $\frac{\mu(1 - \mu)}{n}$    $\sin^{-1}\sqrt{y}$ (degrees)                                    $\frac{821}{n}$
                                              $\sin^{-1}\sqrt{y}$ (radians)                                    $\frac{0.25}{n}$
Negative          $\mu + \lambda^2\mu^2$      $\lambda^{-1}\sinh^{-1}(\lambda\sqrt{y})$ or                     0.25
Binomial                                      $\lambda^{-1}\sinh^{-1}(\lambda\sqrt{y + 0.5})$